Shared resource control in a distributed processing system.



A system and method for determining a master process for control of a
shared system resource. The improved system requires the master
process to hold exclusive access on a shared resource control file
only intermittently. The master process periodically updates the
shared resource control file with a new timestamp. Processes seeking
resource access read the shared control file and determine whether
another process has been designated master. If the interval since the
latest timestamp is greater than a preset staleness interval, the
shared control file is discarded and a new one created by the
accessing process.
The present invention relates to the operation of distributed
processing computer systems. In particular it relates to those systems
that have a plurality of processing nodes each one having access to a
number of shared resources and which require apparatus and methods for
managing access to the shared resources. Still more particularly, the
present invention relates to the management of a shared control file
that designates one of a number of distributed processes as the master
process for controlling access to that shared resource.

Distributed computer systems are created by linking a number of
computer systems using a communications network. Distributed systems
frequently have the ability to share data resident on individual
systems. Replicated data systems implement data sharing by providing a
replica copy of a data object to each process using that data object.
Replication reduces the access time for each processor by eliminating
the need to send messages over the network to retrieve and supply the
necessary data. A replicated object is a logical unit of data existing
in one of the computer systems but physically replicated to multiple
distributed computer systems. Replicated copies are typically
maintained in the memories of the distributed systems.

Replicated data objects also speed the update process by allowing
immediate local update of a data object. Replication introduces a
control problem, however, because many copies of the data object
exist. The distributed system must have some means for controlling
data update to ensure that all copies of the data remain consistent.

Prior art systems control data consistency by establishing a master
data object copy in one of the distributed systems. The master copy is
always assumed to be valid. Data object update by a system other than
that of the master copy requires sending the update request to the
master for update and propagation to all replicas. This approach has
the disadvantage of slowing local response time as the master data
object update and propagation are performed.

Another means for controlling replicated data is described in Moving
Write Lock for Replicated Objects, commonly assigned, filed on October
16, 1992 as US patent application Serial Number 07/961,757 and having
attorney docket number AT992-046. This document, which does not form
part of the state of the art as defined in Articles 54(2) and 54(3)
EPC, describes an apparatus and method that require that a single
"write lock" exist in a distributed system and be passed to each
process on request. Data object updates can only be performed by the
holder of the "write lock." The "write lock" holder may update the
local object copy and then send that update to the master processor
for its update and propagation to other processes. A copy of the above
patent application is filed herewith.



The method for determining which of a number of distributed processes
is to be master is described in pending US patent application serial
number 07/961,750 filed October 16, 1992 and entitled Determining a
Winner of a Race in a Data Processing System, commonly assigned and
bearing attorney docket number AT992-117. This document does not form
part of the state of the art as defined by Articles 54(2) and 54(3)
EPC. The "race" among the processes potentially controlling a resource
results in the assignment of master status to the process first
establishing write control over a Shared Control File. Once control
has been established by one process, other processes are assigned
"shadow" status. Master process failure causes reevaluation of master
status. A copy of this patent application is filed herewith.

The technical problem addressed by the present invention is providing
fault-tolerant features to a distributed processing system using write
lock management of replicated data objects. Fault tolerance is
required to ensure that no data or updates are lost due to the failure
of a master process. Prior art systems, and those systems referenced
above, require the master determination and write lock control to be
reinitialized. This could result in loss of data if a locally updated
data object replica has not been propagated to the master or other
replicas.

Accordingly, in a first aspect of the invention, there is provided a
method of determining a master process for control of a shared
resource in a computer system having a plurality of processes
operating on at least one processor that has a memory and access to a
shared data storage means, the method comprising: testing said shared
data storage means for the presence of a shared resource control file
for said shared resource; if no file exists, creating a shared
resource control file in said shared data storage means, writing
master process identification information to said shared resource
control file, and writing a timestamp to said control file; if a file
exists, requesting exclusive access to said file; if access denied,
waiting and retrying; if access granted, determining the difference
between current time and the last time stamp; if said difference is
less than a first preset interval, designating the requesting process
as a shadow process; if said difference is greater than said first
preset interval, discarding said shared resource control file,
creating a new shared resource control file and writing master process
identification information to said shared resource control file, and
writing a timestamp to said control file; if said requesting process
is a master process for said shared resource, replacing the timestamp
in said shared resource control file with a current timestamp after a
preset second interval has passed.

Thus the present invention is directed to an improved system and
method for managing a distributed processing system. The system for
designating a master process for a data object is improved to reduce
the length of time exclusive control of a master process indicator is
required while maintaining the ability of the indicator to show master
status. The master process that wins the race for control holds
temporary write access to a control indicator to update its master
data and then relinquishes control. Periodically thereafter, the
master process updates the control indicator. Any other process
accessing the control indicator will know that another process is
master unless the last master update is stale. In that case, the
surviving processes again race for control.

The present invention includes a process of determining a master
process for control of a shared resource in a computer system having a
number of processes operating on at least one processor that has a
memory and access to a shared data storage means. The process includes
testing the shared data storage means for the presence of a shared
resource control file for the shared resource; if no file exists,
creating a shared resource control file in the shared data storage
means, writing master process identification information to the shared
resource control file, and writing a timestamp to the control file; if
a file exists, requesting exclusive access to the file; if access
denied, waiting and retrying; if access granted, determining the
difference between current time and the last time stamp; if the
difference is less than a first preset interval, designating the
requesting process as a shadow process; if the difference is greater
than the first preset interval, discarding the shared resource control
file, creating a new shared resource control file and writing master
process identification information to the shared resource control
file, and writing a timestamp to the control file; if the requesting
process is a master process for the shared resource, replacing the
timestamp in the shared resource control file with a current timestamp
after a preset second interval has passed.

It is an object of the present invention to improve 
master process efficiency by reducing the length of 
time exclusive control over a master process indicator 
is required. 

A preferred embodiment of the invention will now 
be described, by way of example only, with reference 
to the accompanying drawings in which: 

Figure 1 is a block diagram of a computer system of the type in which
the present invention is embodied.

Figure 2 is a block diagram of a distributed network according to the
present invention.

Figure 3 is a flowchart depicting the master resolution logic of an
alternative system.

Figure 4 is a flowchart depicting the master resolution logic of the
preferred embodiment of the present invention.

The present invention is practised in a distributed processing
computer environment. This environment consists of a number of
computer processors linked together by a communications network.
Alternatively, the present invention could be practised in a
multiprogramming system in which a single computer (e.g. single CPU)
supports the execution of multiple processes each having a separate
address space.

The preferred embodiment is practised with linked computers. Each
computer typically has the components shown generally for the system
100 in Figure 1. Processing is provided by central processing unit or
CPU 102. CPU 102 acts on instructions and data stored in random access
memory 104. Long term storage is provided on one or more disks 122
operated by disk controller 120. A variety of other storage media
could be employed including tape, CD-ROM, or WORM drives. Removable
storage media may also be provided to store data or computer process
instructions. Operators communicate with the system through I/O
devices controlled by I/O controller 112. Display 114 presents data to
the operator while keyboard 116 and pointing device 118 allow the
operator to direct the computer system. Communications adapter 106
controls communications between this processing unit and others on a
network to which it is connected by network interface 108.

Computer system 100 can be any known computer system including
microcomputers, mini-computers and mainframe computers. The preferred
embodiment envisions the use of computer systems such as the IBM
Personal System/2 (PS/2) or IBM RISC System/6000 families of
computers. (IBM, Personal System/2, PS/2 and RISC System/6000 are
trademarks of the IBM Corp.) However, workstations from other vendors
such as Sun Microsystems, Inc. or Hewlett Packard may be used, as well
as computers from Compaq Computer Corp. or Apple Computer Corp.

A distributed processing system is shown in Figure 2. Each of the
processing nodes 202, 204, 206, 208, 210 is connected to a network 200
that enables communications among the processors. Additional permanent
storage may be associated with the network as shown by disk storage
unit 212. In the alternative, persistent storage in one of the
processing nodes could be used for network persistent storage.

Network 200 can be any type of network including LAN, WAN, ATM or
other. Physical network protocols such as Ethernet or Token Ring can
be used and communications protocols such as TCP/IP or NetBIOS or
Novell NetWare can control the network. Network file system management
can be provided by a program based on the Sun Microsystems NFS
technology or CMU AFS technology. Each of these file system programs
allows distributed processes to access and manage data residing on
remote systems. These systems create a single logical file system for
each processor regardless of the physical location of individual
files. NFS is described in greater detail in the IBM Corp. publication
Communication Concepts and Procedures, Order No. SC23-2203-00.

The variety of permitted networks means that the processing nodes may
be distributed throughout a building, across a campus, or even across
national boundaries.

The preferred embodiment of the present invention is practised in a
distributed network of peer processing nodes. Peer nodes each have
equal status in the network with none being master or slave nodes.
Using peer nodes improves network efficiency because there is no
single bottleneck through which requests must be funnelled. Instead
each node can act independently to perform its functions. Another
advantage is that failure of any particular node will not cause the
entire network to fail as would be the case where a master processor
existed. The disadvantage of peer networks is that there is no focal
point for controlling data integrity of replicated data.

The above referenced patent application for Determining the Winner of
a Race in a Data Processing System, filed herewith, teaches a
procedure for "racing" for control of a resource. Figure 3 illustrates
the steps of this process. The process starts by generating a request
for a common resource 150. The process requesting the resource tests
to determine whether or not a shared control file exists 152. If not,
the process creates a shared control file 154. In either case, the
process attempts to hold exclusive write access 156. If this is
successful 158, the process updates the shared control file 160 and it
becomes master of that resource 162. If the attempt to acquire
exclusive write access fails, the process is not the master 164 and
must read the name of the master from the shared control file 166 and
connect to the master 168 as a shadow 170. If the requesting process
is the master, it can directly access the resource; otherwise, it is a
shadow process and must negotiate with the master for access 176.
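
By way of illustration only, the race of Figure 3 might be sketched in Python as follows. The sketch assumes a Unix-like system in which `fcntl.flock` stands in for the exclusive write access; the function name `race_for_master`, the plain-text file layout and the use of an advisory lock are assumptions made for this example and are not taken from the referenced application.

```python
import fcntl
import os


def race_for_master(control_path, process_id):
    """Return (role, master_id, fd); fd is an open descriptor only for the master."""
    # Steps 152/154: O_CREAT creates the shared control file if it is missing.
    fd = os.open(control_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Steps 156/158: try to take the exclusive lock without blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Steps 164-170: the race was lost, so read the master's name
        # and continue as a shadow.
        master_id = os.read(fd, 1024).decode()
        os.close(fd)
        return "shadow", master_id, None
    # Steps 160/162: the race was won; record our identity and keep the
    # lock, which means keeping the descriptor open.
    os.ftruncate(fd, 0)
    os.write(fd, process_id.encode())
    return "master", process_id, fd
```

In this sketch the winner must keep the returned descriptor open for as long as it remains master, because closing the descriptor releases the lock.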

The shared control file of the preferred embodiment is a storage file
in the logical file system. As such, it resides on one of the
permanent storage devices in the distributed system. The present
invention is equally applicable, however, to a shared resource control
file managed in volatile memory (RAM) that is sharable among the
distributed processes.

The requirement that the master process maintain an exclusive write
lock 156 on the shared control file is undesirable in many systems.
Each process may be master of a number of resources. Each replicated
data object has a master and a particular user may cause a single
process to be master of a large number of data objects. Each exclusive
write lock ties up a process file descriptor. In many operating
systems, the number of file descriptors allocated to each process is
limited. For example, older versions of the UNIX operating system
(UNIX is a registered trademark of Unix System Laboratories, Inc)
allowed only four or five open file descriptors per process. Thus, a
particular process may be restricted in accessing resources because of
a limit on file descriptors.

The present invention is directed to removing that limitation by
allowing the master to release the exclusive write mode while still
being the master process for that resource. The previous system
indicated race failures to shadows by denying them the exclusive write
access to the shared control file. The present invention replaces this
master status indicator with a timestamp and control file age check.
This change allows an unlimited number of replicated objects for each
process.

The selection of the master process (the winner of the race) in the
preferred embodiment of the present invention will be described with
reference to Figure 4 in which reference numbers corresponding to the
reference numbers of Figure 3 indicate equivalent process steps.

The process starts when a processor requests a common resource 150.
The existence of a shared control file is tested 152. If no shared
control file exists, the process creates one 154, obtains exclusive
access, and writes identifying data including the master identity and
a timestamp 180. Processing continues at test 172 where status is
checked prior to access.

When a shared control file exists, the process attempts to gain
exclusive write access to the file 156'. Failure to gain access means
another process has exclusive access to the shared control file. In
the preferred embodiment, either the master or a shadow could have
exclusive access, so the process must retry 156' until it actually
acquires exclusive access to check its status.

Success in this case does not assure the process that it is master.
Instead, the process must read the timestamp value from the shared
control file 182 and compare it to the current time 184. If the
difference between the current time and the time stamp is less than a
set period INTER_BEAT, the designated master process is still in
control and the requesting process is a shadow process 186. If the
difference is greater than INTER_BEAT, then the shared control file is
stale. The requesting process discards the old shared control file 188
and creates a new one in which it writes its own master process
identification and timestamp 189. Processing continues at step 172.
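
For purposes of illustration only, the status check of Figure 4 could be sketched in Python as below. The sketch assumes a Unix-like system with `fcntl.flock` providing the brief exclusive access; the name `resolve_status`, the JSON record layout and the `now` parameter (which lets a caller supply the clock) are hypothetical. As a simplification, the stale file is truncated and rewritten under the lock rather than discarded and recreated.

```python
import fcntl
import json
import os
import time

INTER_BEAT = 90.0  # staleness threshold in seconds (preferred embodiment value)


def resolve_status(control_path, process_id, now=None, retry_delay=0.05):
    """Return "master" or "shadow" for process_id, loosely following Figure 4."""
    # Steps 152/154: open the shared control file, creating it if absent.
    fd = os.open(control_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        while True:
            try:
                # Step 156': request exclusive access without blocking.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                time.sleep(retry_delay)  # access denied: wait and retry
        current = time.time() if now is None else now
        raw = os.read(fd, 4096)
        record = json.loads(raw) if raw else None  # empty means newly created
        # Steps 182-186: a fresh timestamp means another process is master.
        if record is not None and current - record["timestamp"] < INTER_BEAT:
            return "shadow"
        # Steps 180 or 188-189: no record, or a stale one, so claim mastery.
        # (An interval exactly equal to INTER_BEAT is treated as stale here;
        # the text does not specify that case.)
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        entry = {"master": process_id, "timestamp": current}
        os.write(fd, json.dumps(entry).encode())
        return "master"
    finally:
        os.close(fd)  # closing releases both the lock and the descriptor
```

Because the descriptor is closed before the function returns, a process that is master of many resources holds no descriptor per resource between checks.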

The process requesting resource access tests whether it is the master
of that resource at 172. If it is the master, it may access the
resource 174. If not, it is a shadow process and must negotiate for
access 176. The master process must continually update the shared
control file to maintain control as the master. Every HEART_BEAT
seconds 190 the master process attempts to obtain exclusive access to
the shared control file. The request may fail 193 due to a shadow
process holding exclusive access to check master status. If the
request fails, the master process waits and tries again. If the
request succeeds, the master replaces the timestamp.
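
The periodic update by the master might be sketched in Python as follows, again by way of example only. The sketch assumes a Unix-like system with `fcntl.flock` as the exclusive access mechanism; the names `replace_timestamp` and `heartbeat_loop`, the JSON record layout and the use of a `threading.Event` to stop the loop are assumptions made for the example.

```python
import fcntl
import json
import os
import threading
import time

HEART_BEAT = 30.0  # seconds between updates (preferred embodiment value)


def replace_timestamp(control_path, process_id, now=None, retry_delay=0.05):
    """Briefly take exclusive access and write a fresh timestamp."""
    fd = os.open(control_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        while True:
            try:
                # Step 192: attempt to obtain exclusive access.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                # Step 193 failure: a shadow is checking status; wait, retry.
                time.sleep(retry_delay)
        stamp = time.time() if now is None else now
        os.ftruncate(fd, 0)
        entry = {"master": process_id, "timestamp": stamp}
        os.write(fd, json.dumps(entry).encode())
        return stamp
    finally:
        os.close(fd)  # exclusive access is held only for the update itself


def heartbeat_loop(control_path, process_id, stop, interval=HEART_BEAT):
    """Step 190: refresh the timestamp every `interval` seconds until stopped."""
    while not stop.wait(interval):
        replace_timestamp(control_path, process_id)
```

A master could run `heartbeat_loop` in a background thread for each resource it controls and set the `stop` event when it gives up mastery.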

The periods INTER_BEAT and HEART_BEAT are set by the application
developer. The difference INTER_BEAT - HEART_BEAT must be greater than
the expected wait time for the master process to gain exclusive write
access in step 192. If this difference is too short, the shared
control file will be prematurely invalidated. On the other hand, if
INTER_BEAT is too long, transfer of control to one of the shadow
processes will be delayed. The preferred embodiment sets HEART_BEAT at
30 seconds and INTER_BEAT at 90 seconds. These values ensure that the
process does not tie up system resources in frequent timestamp
updates, but still provide sufficiently prompt discovery of master
process failure.
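
The relationship between the two periods can be checked with a few lines of Python. This is a minimal sketch; the function names and the 5 second expected wait used in the usage note are hypothetical and serve only to make the arithmetic concrete.

```python
def interval_slack(heart_beat, inter_beat):
    """Time left for the master to obtain exclusive access after a beat is due."""
    return inter_beat - heart_beat


def intervals_are_safe(heart_beat, inter_beat, expected_lock_wait):
    """True when INTER_BEAT - HEART_BEAT exceeds the expected lock wait."""
    return interval_slack(heart_beat, inter_beat) > expected_lock_wait
```

With the preferred values, `interval_slack(30, 90)` is 60 seconds, so `intervals_are_safe(30, 90, 5)` holds for an assumed 5 second wait, whereas a setting such as `intervals_are_safe(30, 32, 5)` does not. A master that fails immediately after an update is not detected until INTER_BEAT seconds after that update, which is the delay the longer period trades against.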

The system of the present invention has the advantage of reducing the
length of time a process must hold an exclusive lock. It also assures
that failure of a master process can be detected and another master
established without undue delay.

It will be understood from the foregoing description that various
modifications and changes may be made in the preferred embodiment of
the present invention. It is intended that this description is for
purposes of illustration only and should not be construed in a
limiting sense. The scope of this invention should be limited only by
the language of the following claims.



Claims 

1. A method of determining a master process for control of a shared
resource in a computer system having a plurality of processes
operating on at least one processor that has a memory and access to a
shared data storage means, the method comprising:

testing said shared data storage means for the presence of a shared
resource control file for said shared resource;

if no file exists, creating a shared resource control file in said
shared data storage means, writing master process identification
information to said shared resource control file, and writing a
timestamp to said control file;

if a file exists, requesting exclusive access to said file;

if access denied, waiting and retrying;

if access granted, determining the difference between current time and
the last time stamp;

if said difference is less than a first preset interval, designating
the requesting process as a shadow process;

if said difference is greater than said first preset interval,
discarding said shared resource control file, creating a new shared
resource control file and writing master process identification
information to said shared resource control file, and writing a
timestamp to said control file;

if said requesting process is a master process for said shared
resource, replacing the timestamp in said shared resource control file
with a current timestamp after a preset second interval has passed.

2. A method as claimed in claim 1 wherein said shared data storage
means is a network file system.

3. A method as claimed in claim 1 wherein said shared data storage
means is shared volatile memory.

4. A system for controlling sharing of a plurality of shared resources
in a distributed processing system having a plurality of processors
connected by a communications means, the system comprising:

shared resource control means for storing data indicating a process as
a controlling process for each of said plurality of shared resources
and data indicating a time of a last update of said shared resource
control means by said controlling process;

access control means for limiting exclusive access to said shared
resource control means to at most one process;

shared resource control read means for reading data from said shared
resource control means;

comparison means for comparing said data indicating the time of the
last update and the current time to determine an elapsed interval;

release means for releasing said shared resource control file if said
elapsed interval is greater than a first value;

update means for periodically causing said controlling process to
replace said data indicating the time of the last update with the
current time.



[FIG. 4: flowchart of the master resolution logic of the preferred
embodiment. Legible step labels: 150 REQUEST COMMON RESOURCE; 152
(test for shared control file); 154 CREATE SCF; WRITE SCF INFO &
TIMESTAMP; 156' REQUEST EXCL ACCESS; DISCARD SCF; 189 RECREATE SCF;
172 (master test); 174 ACCESS; 176 NEGOTIATE ACCESS; 192 OBTAIN EXCL
ACCESS; 193 SUCCESS?; REPLACE TIME STAMP.]



EP 0 629 947 A3 



European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 94 30 4097

DOCUMENTS CONSIDERED TO BE RELEVANT

Cited documents (category A where legible; each cited as a whole
document and relevant to claims 1-4):

    IBM TECHNICAL DISCLOSURE BULLETIN, vol. 34, no. 9, February 1992,
    New York, US, pages 246-250, Anonymous, 'Technique for Fast Query
    of a LAN Based Database.'

    EP-A-0 465 871 (IBM), 15 January 1992

    US-A-4 961 224 (YUNG DARBY), 2 October 1990

CLASSIFICATION OF THE APPLICATION: G06F9/46

TECHNICAL FIELDS SEARCHED: G06F

The present search report has been drawn up for all claims.

Place of search: THE HAGUE
Date of completion of the search: 10 April 1995
Examiner: Michel, T

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the
    same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document

